246 research outputs found

    Health record hiccups—5,526 real-world time series with change points labelled by crowdsourced visual inspection

    Background: Large routinely collected data such as electronic health records (EHRs) are increasingly used in research, but the statistical methods and processes used to check such data for temporal data quality issues have not moved beyond manual, ad hoc production and visual inspection of graphs. With the prospect of EHR data being used for disease surveillance via automated pipelines and public-facing dashboards, automation of data quality checks will become increasingly valuable. / Findings: We generated 5,526 time series from 8 different EHR datasets and engaged >2,000 citizen-science volunteers to label the locations of all suspicious-looking change points in the resulting graphs. Consensus labels were produced using density-based clustering with noise, with validation conducted using 956 images containing labels produced by an experienced data scientist. Parameter tuning was done against 670 images and performance calculated against 286 images, resulting in a final sensitivity of 80.4% (95% CI, 77.1%–83.3%), specificity of 99.8% (99.7%–99.8%), positive predictive value of 84.5% (81.4%–87.2%), and negative predictive value of 99.7% (99.6%–99.7%). In total, 12,745 change points were found within 3,687 of the time series. / Conclusions: This large collection of labelled EHR time series can be used to validate automated methods for change point detection in real-world settings, encouraging the development of methods that can successfully be applied in practice. It is particularly valuable since change point detection methods are typically validated using synthetic data, so their performance in real-world settings cannot be assumed to be comparable. While the dataset focusses on EHRs and data quality, it should also be applicable in other fields
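
The four performance figures quoted above all derive from a single confusion matrix. A minimal sketch of the arithmetic follows. The counts in the usage line are made-up illustrative values, not the study's data, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives respectively.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of real change points that were found
        "specificity": tn / (tn + fp),  # share of non-change points correctly left unlabelled
        "ppv": tp / (tp + fp),          # share of labelled points that were real
        "npv": tn / (tn + fn),          # share of unlabelled points that were truly negative
    }


# Illustrative counts only (assumed for this example, not taken from the paper).
metrics = diagnostic_metrics(tp=80, fp=10, tn=890, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because most time points in a series are not change points, the negative class dominates, which is one reason specificity and negative predictive value can sit close to 100% while sensitivity is markedly lower.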

    Evaluation of methods for detecting human reads in microbial sequencing datasets

    Sequencing data from host-associated microbes can often be contaminated by the body of the investigator or research subject. Human DNA is typically removed from microbial reads either by subtractive alignment (dropping all reads that map to the human genome) or by using a read classification tool to predict those of human origin, and then discarding them. To inform best practice guidelines, we benchmarked eight alignment-based and two classification-based methods of human read detection using simulated data from 10 clinically prevalent bacteria and three viruses, into which contaminating human reads had been added. While the majority of methods successfully detected >99 % of the human reads, they were distinguishable by variance. The most precise methods, with negligible variance, were Bowtie2 and SNAP, both of which misidentified few, if any, bacterial reads (and no viral reads) as human. While correctly detecting a similar number of human reads, methods based on taxonomic classification, such as Kraken2 and Centrifuge, could misclassify bacterial reads as human, although the extent of this was species-specific. Among the most sensitive methods of human read detection was BWA, although this also made the greatest number of false positive classifications. Across all methods, the set of human reads not identified as such, although often representing 300 bp) bacterial reads, the highest performing approaches were classification-based, using Kraken2 or Centrifuge. For shorter (c. 150 bp) bacterial reads, combining multiple methods of human read detection maximized the recovery of human reads from contaminated short read datasets without being compromised by false positives. A particularly high-performance approach with shorter bacterial reads was a two-stage classification using Bowtie2 followed by SNAP. Using this approach, we re-examined 11 577 publicly archived bacterial read sets for hitherto undetected human contamination. 
We were able to extract a sufficient number of reads to call known human SNPs, including those with clinical significance, in 6 % of the samples. These results show that phenotypically distinct human sequence is detectable in publicly archived microbial read datasets
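
The idea of combining detection methods can be sketched in a few lines. This is only an illustration of the logic: the detectors below are hypothetical predicates standing in for real tools (for example, an aligner run followed by a check on whether a read mapped), and none of the names come from the paper.

```python
from typing import Callable, Iterable, List, Sequence, Set

Read = str                          # a read is represented by its identifier here
Detector = Callable[[Read], bool]   # returns True if the read is judged human


def two_stage_filter(reads: Iterable[Read], first: Detector, second: Detector) -> List[Read]:
    """Drop reads flagged by the first detector, then drop reads flagged by the second."""
    kept_after_first = [r for r in reads if not first(r)]
    return [r for r in kept_after_first if not second(r)]


def union_flagged(reads: Iterable[Read], detectors: Sequence[Detector]) -> Set[Read]:
    """Return every read flagged as human by at least one detector."""
    return {r for r in reads if any(d(r) for d in detectors)}


# Toy data: the read names and both detectors are invented for illustration.
reads = ["human_1", "bact_1", "human_2", "bact_2"]
strict = lambda r: r == "human_1"              # stands in for a conservative method
permissive = lambda r: r.startswith("human")   # stands in for a more sensitive method

print(two_stage_filter(reads, strict, permissive))
print(sorted(union_flagged(reads, [strict, permissive])))
```

Applying the stages in sequence removes the same reads as taking the union of flags, but the second stage only has to process what survived the first.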

    Short-term genome stability of serial Clostridium difficile ribotype 027 isolates in an experimental gut model and recurrent human disease

    Copyright: © 2013 Eyre et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clostridium difficile whole genome sequencing has the potential to identify related isolates, even among otherwise indistinguishable strains, but interpretation depends on understanding genomic variation within isolates and individuals. Serial isolates from two scenarios were whole genome sequenced. Firstly, 62 isolates from 29 timepoints from three in vitro gut models, inoculated with a NAP1/027 strain. Secondly, 122 isolates from 44 patients (2–8 samples/patient) with mostly recurrent/on-going symptomatic NAP-1/027 C. difficile infection. Reference-based mapping was used to identify single nucleotide variants (SNVs). Across three gut model inductions, two with antibiotic treatment, total 137 days, only two new SNVs became established. Pre-existing minority SNVs became dominant in two models. Several SNVs were detected, only present in the minority of colonies at one/two timepoints. The median (inter-quartile range) [range] time between patients’ first and last samples was 60 (29.5–118.5) [0–561] days. Within-patient C. difficile evolution was 0.45 SNVs/called genome/year (95% CI 0.00–1.28) and within-host diversity was 0.28 SNVs/called genome (0.05–0.53). 26/28 gut model and patient SNVs were non-synonymous, affecting a range of gene targets. The consistency of whole genome sequencing data from gut model C. difficile isolates, and the high stability of genomic sequences in isolates from patients, supports the use of whole genome sequencing in detailed transmission investigations. Peer reviewed

    Hybrid Vibrio vulnificus

    Hybridization between natural populations of Vibrio vulnificus results in hyperinvasive clone

    Mortality risks associated with empirical antibiotic activity in E. coli bacteraemia: an analysis of electronic health records

    Background: Reported bacteraemia outcomes following inactive empirical antibiotics (based on in vitro testing) are conflicting, potentially reflecting heterogeneity in causative species, MIC breakpoints defining resistance/susceptibility, and times to rescue therapy. Methods: We investigated adult inpatients with Escherichia coli bacteraemia at Oxford University Hospitals, UK, from 4 February 2014 to 30 June 2021 who were receiving empirical amoxicillin/clavulanate with/without other antibiotics. We used Cox regression to analyse 30 day all-cause mortality by in vitro amoxicillin/clavulanate susceptibility (activity) using the EUCAST resistance breakpoint (>8/2 mg/L), categorical MIC, and a higher resistance breakpoint (>32/2 mg/L), adjusting for other antibiotic activity and confounders including comorbidities, vital signs and blood tests. Results: A total of 1720 E. coli bacteraemias (1626 patients) were treated with empirical amoxicillin/clavulanate. Thirty-day mortality was 193/1400 (14%) for any active baseline therapy and 52/320 (16%) for inactive baseline therapy (P = 0.17). With EUCAST breakpoints, there was no evidence that mortality differed for inactive versus active amoxicillin/clavulanate [adjusted HR (aHR) = 1.27 (95% CI 0.83–1.93); P = 0.28], nor of an association with active aminoglycoside (P = 0.93) or other active antibiotics (P = 0.18). Considering categorical amoxicillin/clavulanate MIC, MICs > 32/2 mg/L were associated with mortality [aHR = 1.85 versus MIC = 2/2 mg/L (95% CI 0.99–3.73); P = 0.054]. A higher resistance breakpoint (>32/2 mg/L) was independently associated with higher mortality [aHR = 1.82 (95% CI 1.07–3.10); P = 0.027], as were MICs > 32/2 mg/L with active empirical aminoglycosides [aHR = 2.34 (95% CI 1.40–3.89); P = 0.001], but not MICs > 32/2 mg/L with active non-aminoglycoside antibiotic(s) [aHR = 0.87 (95% CI 0.40–1.89); P = 0.72]. 
Conclusions: We found no evidence that EUCAST-defined amoxicillin/clavulanate resistance was associated with increased mortality, but a higher resistance breakpoint (MIC > 32/2 mg/L) was. Additional active baseline non-aminoglycoside antibiotics attenuated amoxicillin/clavulanate resistance-associated mortality, but aminoglycosides did not. Granular phenotyping and comparison with clinical outcomes may improve AMR breakpoints

    The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples

    Shotgun metagenomics is increasingly used to characterise microbial communities, particularly for the investigation of antimicrobial resistance (AMR) in different animal and environmental contexts. There are many different approaches for inferring the taxonomic composition and AMR gene content of complex community samples from shotgun metagenomic data, but there has been little work establishing the optimum sequencing depth, data processing and analysis methods for these samples. In this study we used shotgun metagenomics and sequencing of cultured isolates from the same samples to address these issues. We sampled three potential environmental AMR gene reservoirs (pig caeca, river sediment, effluent) and sequenced samples with shotgun metagenomics at high depth (~200 million reads per sample). Alongside this, we cultured single-colony isolates of Enterobacteriaceae from the same samples and used hybrid sequencing (short- and long-reads) to create high-quality assemblies for comparison to the metagenomic data. To automate data processing, we developed an open-source software pipeline, ‘ResPipe’

    Decline of meticillin-resistant Staphylococcus aureus in Oxfordshire hospitals is strain-specific and preceded infection-control intensification

    Background In the past, strains of Staphylococcus aureus have evolved, expanded, made a marked clinical impact and then disappeared over several years. Faced with rising meticillin-resistant S aureus (MRSA) rates, UK government-supported infection control interventions were rolled out in Oxford Radcliffe Hospitals NHS Trust from 2006 onwards. Methods Using an electronic database, the authors identified isolation of MRSA among 611 434 hospital inpatients admitted to acute hospitals in Oxford, UK, 1 April 1998 to 30 June 2010. Isolation rates were modelled using segmented negative binomial regression for three groups of isolates: from blood cultures, from samples suggesting invasion (eg, cerebrospinal fluid, joint fluid, pus samples) and from surface swabs (eg, from wounds). Findings MRSA isolation rates rose rapidly from 1998 to the end of 2003 (annual increase from blood cultures 23%, 95% CI 16% to 30%), and then declined. The decline accelerated from mid-2006 onwards (annual decrease post-2006 38% from blood cultures, 95% CI 29% to 45%, p=0.003 vs previous decline). Rates of meticillin-sensitive S aureus changed little by comparison, with no evidence for declines 2006 onward (p=0.40); by 2010, sensitive S aureus was far more common than MRSA (blood cultures: 2.9 vs 0.25; invasive samples 14.7 vs 2.0 per 10 000 bedstays). Interestingly, trends in isolation of erythromycin-sensitive and resistant MRSA differed. Erythromycin-sensitive strains rose significantly faster (eg, from blood cultures p=0.002), and declined significantly more slowly (p=0.002), than erythromycin-resistant strains (global p<0.0001). Bacterial typing suggests this reflects differential spread of two major UK MRSA strains (ST22/36), ST36 having declined markedly 2006-2010, with ST22 becoming the dominant MRSA strain. Conclusions MRSA isolation rates were falling before recent intensification of infection-control measures. 
This, together with strain-specific changes in MRSA isolation, strongly suggests that incompletely understood biological factors are responsible for much of the recent variation in MRSA isolation. A major, mainly meticillin-sensitive, S aureus burden remains

    Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines

    Background: Accurately identifying SNPs from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally-sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia and Klebsiella. Results: We evaluated the performance of 209 SNP calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic, bacteria such as Escherichia coli, but less dominant for clonal species such as Mycobacterium tuberculosis. Conclusions: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often employed the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup or Strelka
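
For readers unfamiliar with the Mash distance mentioned above, it is derived from the Jaccard index j of two k-mer sets as D = -(1/k) ln(2j / (1 + j)). The sketch below computes it from exact k-mer sets of two short strings. This is a simplification: the Mash tool itself estimates j from MinHash sketches rather than full sets, and the function names, sequences and k value here are illustrative assumptions.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash distance from a Jaccard index j and k-mer size k.

    Uses D = -(1/k) * ln(2j / (1 + j)); by convention D is 1 when
    the two k-mer sets share nothing (j = 0).
    """
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy sequences and k chosen for illustration only.
k = 4
a = kmers("ACGTACGTTGCA", k)
b = kmers("ACGTACGATGCA", k)
print(mash_distance(jaccard(a, b), k))
```

Identical k-mer sets give a distance of 0, and the distance grows as fewer k-mers are shared, which is why it serves as a proxy for divergence between reads and reference.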

    Extended Sequence Typing of Campylobacter spp., United Kingdom

    Supplementing Campylobacter spp. multilocus sequence typing with nucleotide sequence typing of 3 antigen genes increased the discriminatory index achieved from 0.975 to 0.992 among 620 clinical isolates from Oxfordshire, United Kingdom. This enhanced typing scheme enabled identification of clusters and retained data required for long-range epidemiologic comparisons of isolates
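
A discriminatory index for a typing scheme is commonly computed as Simpson's index of diversity in the Hunter and Gaston formulation, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total. The sketch below assumes that this is the index meant here. The isolate types in the usage lines are invented, not the Oxfordshire data.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston discriminatory index for a list of isolate type labels.

    Gives the probability that two isolates drawn at random without
    replacement belong to different types.
    """
    n_total = len(types)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


# Invented example: a coarse scheme versus a finer scheme on the same six isolates.
coarse = ["ST21", "ST21", "ST21", "ST45", "ST45", "ST48"]
fine = ["ST21-a", "ST21-b", "ST21-b", "ST45-a", "ST45-b", "ST48-a"]
print(discriminatory_index(coarse))
print(discriminatory_index(fine))
```

Splitting a large type into subtypes reduces the number of same-type pairs, so the index rises, which is the effect an extended typing scheme aims for.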

    Evaluation of Nanopore sequencing for Mycobacterium tuberculosis drug susceptibility testing and outbreak investigation: a genomic analysis

    Mycobacterium tuberculosis whole-genome sequencing (WGS) has been widely used for genotypic drug susceptibility testing (DST) and outbreak investigation. For both applications, Illumina technology is used by most public health laboratories; however, Nanopore technology developed by Oxford Nanopore Technologies has not been thoroughly evaluated. The aim of this study was to determine whether Nanopore sequencing data can provide equivalent information to Illumina for transmission clustering and genotypic DST for M tuberculosis